This Quarto document provides a reproducible workflow for analyzing public transit service equity in a Canadian city. The primary goal is to investigate how the level of public transit service, measured by frequency, varies across different income groups and throughout a typical weekday.
The methodology is adapted from the 2022 paper Exploring the Time Geography of Public Transport Networks with the gtfs2gps package by Pereira, R.H.M., Andrade, P.R. & Vieira, J.P.B.
The analysis culminates in two key visualizations:
A 2D plot illustrating the average transit service frequency for each income decile across 10-minute intervals over a 24-hour period.
A 3D representation of the same plot, offering a more intuitive grasp of the peaks and valleys in service provision.
By parameterizing inputs such as the city name, census codes, and data files, this document is designed to be a flexible tool for anyone interested in conducting similar equity analyses across Canada.
2 Setup and Configuration
This section handles all the initial setup required for the analysis. It configures directories, validates input files, sets up the necessary Java environment for the routing engine, and loads all required R packages.
Both of the following files should be downloaded to the data-raw directory.
GTFS: Visit your chosen city’s open data portal or transit agency’s website to download the General Transit Feed Specification for your chosen study area.
GHS-BUILT-C: Visit https://human-settlement.emergency.copernicus.eu/download.php?ds=builtC to download the GHS-BUILT-C spatial raster for the chosen study area. This raster delineates the boundaries of human settlements at a 10m resolution, and is later used to more accurately model the spatial distribution of the study population.
Code
# Validate GTFS fileif (!fs::file_exists(gtfs_raw)) {stop("GTFS file not found at: ", gtfs_raw, "\n","Ensure the file '", params$gtfs_file, "' exists in the '", dir_raw_data, "' directory." )} else {message("GTFS file located: ", gtfs_raw)}# Validate GHS_BUILT_C raster fileif (!fs::file_exists(ghs_built_c_raw)) {stop("GHS-BUILT-C raster file not found at: ", ghs_built_c_raw, "\n","Ensure the file '", params$ghs_built_c_file, "' exists in the '", dir_raw_data, "' directory." )} else {message("GHS-BUILT-C raster file located: ", ghs_built_c_raw)}
2.4 Install and Setup Java Environment for r5r
The r5r package requires Java Development Kit (JDK) 21. The rJavaEnv package provides a way to install it from within R.
Code
if (!require("rJavaEnv")) install.packages("rJavaEnv")# java_quick_install() will ensure the specified version of Java is available.rJavaEnv::java_quick_install(version =21)# Verify the installation by checking which Java version rJava will use.rJavaEnv::java_check_version_rjava()
[1] "21"
Code
# Set Java memory BEFORE loading r5r.# The default is 512MB. Adjust "12" based on your system's available RAM.options(java.parameters ="-Xmx12G")
2.5 Load Packages
Code
# Data manipulation and spatial datalibrary(data.table)library(sf) library(sfheaders) library(dplyr) library(terra)library(raster)# GTFS processinglibrary(gtfstools) library(gtfs2gps) # R5R processinglibrary(r5r)library(osmextract)# Census datalibrary(cancensus) # Utilitieslibrary(pbapply) library(stringr) library(readr)library(progressr)library(rJavaEnv)# Plotting and Visualizationlibrary(ggplot2) library(viridis) library(patchwork) library(rayshader) library(mapview)# Parallel Computinglibrary(future)library(doParallel)
3 Data Acquisition and Initial Processing
This section loads the primary datasets required for the analysis. We will acquire the administrative boundary for the specified city, the GTFS data, the GHS-BUILT-C raster, and the OpenStreetMap (OSM) network.
3.1 Study Area Definition
First, we retrieve the Census Subdivision (CSD) boundary for the city defined in the parameters. This boundary will serve as the spatial extent for our analysis.
Code
city_bound_sf <- cancensus::get_census(dataset = params$census_dataset,regions =list(CSD = params$csd_code),level ="CSD",geo_format ="sf",use_cache =TRUE)# Transform to WGS84 coordinate system (EPSG:4326)city_bound_sf <- sf::st_transform(city_bound_sf, 4326)# Define the path for saving the boundary datapath_city_bound_rds <- fs::path(dir_city_data, paste0(city_prefix, "_bound_CSD.rds"))# Save the boundary data to the city-specific processed data folderreadr::write_rds(city_bound_sf, path_city_bound_rds)
Visualize Study Area A map is generated below to confirm that the correct boundary has been retrieved.
3.2 Transit Data (GTFS)
Next, we load the GTFS file for the city’s transit agency and filter it to include only the services running on the specific day of the week chosen for the analysis (e.g., ‘wednesday’).
Code
# Load GTFS Datacity_gtfs <- gtfstools::read_gtfs(path = gtfs_raw)# Filter GTFS by the specified analysis daycity_gtfs_filtered <- gtfstools::filter_by_weekday(city_gtfs, weekday = params$analysis_day)
3.3 Built-Up Area Data (GHS-BUILT-C)
We load the GHS-BUILT-C raster file, which is a dataset that identifies built-up areas. We will later use this to distinguish residential areas from non-residential ones (e.g., industrial areas, parks).
Finally, we download the OpenStreetMap street network data for the region containing our city. This file will be used by the r5r routing engine to calculate walking times from residential areas to transit stops.
Code
path_pbf_full <- fs::path(dir_raw_data, paste0(params$city_name, "_full.osm.pbf"))# Download the full regional street network data if it doesn't already exist.if (!fs::file_exists(path_pbf_full)) {message("Regional street network file not found. Downloading...") osmextract::oe_get( city_bound_sf,provider ="geofabrik",quiet =FALSE,download_only =TRUE,download_directory = dir_raw_data,filename =paste0(params$city_name, "_full.osm.pbf") )}
4 Building the Analytical Datasets
With the raw data loaded, this section details the steps required to process and integrate these sources into a final, unified dataset for analysis.
4.1 Creating a Uniform Analysis Grid
To analyze spatial patterns of population, income, and transit service, we will create a grid that covers the entire city boundary. This grid will become the fundamental unit of our analysis.
Code
# Set the projected CRS from parameters for accurate distance calculations in metersprojected_crs <-as.numeric(params$wgs84_utm_code)# Project the city boundary to the specified UTM zonecity_bound_proj <- sf::st_transform(city_bound_sf, projected_crs)# Define target area for each hexagon, similar to H3 level 9 (~0.11 km^2) target_area_m2 <-110000# Calculate the required cell size for st_make_grid based on the target areatarget_cellsize <-sqrt(2* target_area_m2 / (3*sqrt(3))) *sqrt(3)# Create a hexagonal grid covering the bounding box of the projected city boundary grid_poly <- sf::st_make_grid( city_bound_proj,cellsize = target_cellsize,what ="polygons",square =FALSE)# Convert the grid to an sf object and add a unique ID for each hexagon city_hex_grid_sf <- sf::st_sf(hex_id =1:length(grid_poly), geometry = grid_poly)city_hex_grid_sf <- sf::st_set_crs(city_hex_grid_sf, projected_crs)# Clip the grid to the precise city boundary city_hex_grid_proj <- sf::st_intersection(city_hex_grid_sf, city_bound_proj)# Select only the hex_id and geometrycity_hex_grid_proj <- dplyr::select(city_hex_grid_proj, hex_id, geometry)
4.2 Identifying Residential Areas
Not all land within a city is residential. To improve the accuracy of our analysis, we need to distinguish populated residential lands from industrial areas, parks, airports, and other non-residential zones. By doing so, we ensure that when we later add census data, we are allocating population and income information only to the areas where people actually live.
This process involves using the GHS-BUILT-C raster as a filter on the Census Dissemination Area (DA) polygons.
Code
# 1. Load Dissemination Area (DA) Boundaries# Retrieve DA geometries for the city and project them to the local UTM CRS.city_da_sf <- cancensus::get_census(dataset = params$census_dataset,regions =list(CSD = params$csd_code),level ="DA",geo_format ="sf",use_cache =TRUE)city_da_proj <- sf::st_transform(city_da_sf, crs = projected_crs)# 2. Align GHS Raster with Local CRS# Temporarily reproject DA boundaries to match the GHS raster's CRS (Mollweide)city_da_mollweide <- sf::st_transform(city_da_proj, sf::st_crs(ghs_raster))# Crop the global raster to the extent of the cityghs_raster_cropped <- terra::crop(ghs_raster, city_da_mollweide, snap ="out")# Project the cropped raster to our local UTM CRS using nearest neighbor resamplingghs_raster_proj <- terra::project(ghs_raster_cropped, y = terra::crs(city_da_proj), method ="near")# 3. Filter and Vectorize Residential Polygons# This process isolates the residential parts of each DA.# Create a boolean mask where residential cells (value == 1) are TRUE.ghs_residential_mask <- ghs_raster_proj ==1ghs_residential_mask[ghs_residential_mask ==0] <-NA# Set non-residential to NA# Rasterize the DA polygons, using the unique GeoUID as the cell value.da_id_raster <- terra::rasterize(x = city_da_proj,y = ghs_residential_mask,field ="GeoUID",touches =TRUE)# Use the residential mask to filter the DA ID raster. Only cells that are# residential will retain their DA GeoUID.final_raster_to_vectorize <- terra::mask(da_id_raster, ghs_residential_mask)# Vectorize the result back into polygons. 'dissolve = TRUE' groups all cells# with the same GeoUID into a single multipolygon for that DA.residential_polygons_sf <- terra::as.polygons( final_raster_to_vectorize,dissolve =TRUE,na.rm =TRUE) %>% sf::st_as_sf() %>% dplyr::rename(GeoUID =1) # Rename the ID column# Join the original attributes from the DA data frame back to the new polygonsda_attributes <- sf::st_drop_geometry(city_da_proj)residential_polygons_sf <- dplyr::left_join(residential_polygons_sf, da_attributes, by ="GeoUID")# Clean up large intermediate objects to free memoryrm(ghs_raster, ghs_raster_proj, ghs_residential_mask, da_id_raster, final_raster_to_vectorize, da_attributes, city_da_mollweide, ghs_raster_cropped)gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 4350614 232.4 8096730 432.5 6296908 336.3
Vcells 59993247 457.8 101797938 776.7 70570727 538.5
Visualize Residential Area Filtering To make the result of this spatial filtering clear, the map below overlays the new, smaller residential-only polygons (in red) on top of their original, larger Dissemination Area boundaries (in grey).